Search results for "004 Informatik"
Showing 10 of 10 documents
NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction
2013
Computing Euclidean Steiner trees over segments
2020
In the classical Euclidean Steiner minimum tree (SMT) problem, we are given a set of points in the Euclidean plane and must find a minimum-length tree that connects all of them, where arbitrary additional points may be added. We investigate the variant of the problem in which the input is a set of line segments. We allow these segments to have length 0, i.e., to be points, and hence we generalize the classical problem. Furthermore, the segments may intersect, so that polygonal input can be modeled. As in the GeoSteiner approach of Juhl et al. (Math Program Comput 10(2):487–532, 2018) for the classical case, we use a two-phase approach where we construct a superse…
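To illustrate why permitting additional points shortens the tree, consider the textbook three-terminal case (a standard example, not taken from this paper): three points at the corners of an equilateral triangle with side length 1. Without extra points the best tree is a minimum spanning tree (MST); with one added Steiner point at the centroid, three edges meet at 120°.

```latex
\begin{align*}
L_{\mathrm{MST}} &= 1 + 1 = 2
  && \text{(two triangle sides, no extra points)}\\
L_{\mathrm{SMT}} &= 3 \cdot \tfrac{1}{\sqrt{3}} = \sqrt{3} \approx 1.732
  && \text{(Steiner point $S$ at the centroid, $|SA| = |SB| = |SC| = 1/\sqrt{3}$)}\\
\frac{L_{\mathrm{SMT}}}{L_{\mathrm{MST}}} &= \frac{\sqrt{3}}{2} \approx 0.866
\end{align*}
```

The saving of roughly 13% in this instance comes entirely from the added Steiner point.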
Rule Extraction From Binary Neural Networks With Convolutional Rules for Model Validation
2020
Classification approaches that allow the extraction of logical rules, such as decision trees, are often considered more interpretable than neural networks. Also, logical rules are comparatively easy to verify for any possible input, which is an important part of systems that aim to ensure the correct operation of a given model. However, for high-dimensional input data such as images, the individual symbols, i.e. pixels, are not easily interpretable. Therefore, rule-based approaches are not typically used for this kind of high-dimensional data. We introduce the concept of first-order convolutional rules, which are logical rules that can be extracted using a convolutional neural network (CNN), and w…
HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data
2014
Background: Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in 454 sequencing reads. Results: We present HECTOR, a parallel multistage homopolymer-spectrum-based error corrector for 454 sequencing data. In this algorithm, we investigate for the first time a novel homopolymer-spectrum-based approach to handle homopolymer insertions and deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluat…
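The abstract does not spell out what a homopolymer spectrum is. A minimal sketch, assuming it is a count of (base, run length) pairs over all reads, analogous to a k-mer spectrum; the function name `homopolymer_spectrum` is hypothetical, and this is not HECTOR's implementation.

```python
from collections import Counter
from itertools import groupby


def homopolymer_spectrum(reads):
    """Count (base, run_length) pairs over all reads.

    Hypothetical illustration: every maximal run of a single base
    (a homopolymer) contributes one count for its base and its length.
    """
    spectrum = Counter()
    for read in reads:
        # groupby splits the read into maximal runs of identical bases
        for base, run in groupby(read):
            spectrum[(base, sum(1 for _ in run))] += 1
    return spectrum


spec = homopolymer_spectrum(["AAACGTT", "AAAACG"])
print(sorted(spec.items()))
# → [(('A', 3), 1), (('A', 4), 1), (('C', 1), 2), (('G', 1), 2), (('T', 2), 1)]
```

In a spectrum built from high-coverage data, rarely observed run lengths would be natural candidates for insertion or deletion errors; how HECTOR actually uses its spectrum is cut off in the abstract.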
Studying the evolution of neural activation patterns during training of feed-forward ReLU networks
2021
The ability of deep neural networks to form powerful emergent representations of complex statistical patterns in data is as remarkable as it is imperfectly understood. For deep ReLU networks, these representations are encoded in the mixed discrete–continuous structure of linear weight matrices and non-linear binary activations. Our article develops a new technique for instrumenting such networks to efficiently record activation statistics, such as information content (entropy) and similarity of patterns, in real-world training runs. We then study the evolution of activation patterns during training for networks of different architectures using different training and initialization strategies. As a result, we see …
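A minimal sketch of the kind of statistic mentioned above: the binary activation pattern of a ReLU layer (which units have a positive pre-activation) and the Shannon entropy, in bits, of the empirical distribution of those patterns over a batch of inputs. The helper names and the plain-list dense layer are hypothetical illustrations, not the article's instrumentation technique.

```python
import math
from collections import Counter


def pre_activations(weights, bias, x):
    """Pre-activations W @ x + b of one dense layer, using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def activation_pattern(pre):
    """Binary ReLU pattern: 1 where the unit is active (pre-activation > 0)."""
    return tuple(1 if z > 0 else 0 for z in pre)


def pattern_entropy(patterns):
    """Shannon entropy (bits) of the empirical distribution of patterns."""
    counts = Counter(patterns)
    n = len(patterns)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
batch = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
patterns = [activation_pattern(pre_activations(W, b, x)) for x in batch]
print(pattern_entropy(patterns))  # → 2.0 (four distinct, equally frequent patterns)
```

Entropy is maximal when every input produces a different pattern and zero when all inputs share one pattern, so tracking it over training gives a coarse view of how the discrete structure of the network evolves.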
Parallel and scalable short-read alignment on multi-core clusters using UPC++
2016
The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high-quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since the availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the …
Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability
2016
CheS-Mapper - Chemical Space Mapping and Visualization in 3D
2012
Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and their biological or toxic effects. In that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and then arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which f…
CUSHAW3: Sensitive and Accurate Base-Space and Color-Space Short-Read Alignment with Hybrid Seeding
2014
The majority of next-generation sequencing short reads can be properly aligned by leading aligners at high speed. However, alignment quality can still be improved, since usually not all reads can be correctly aligned to large genomes, such as the human genome, even for simulated data. Moreover, even slight improvements in this area are important but challenging, and usually require significantly more computational effort. In this paper, we present CUSHAW3, an open-source, parallelized, sensitive and accurate short-read aligner for both base-space and color-space sequences. In this aligner, we have investigated a hybrid seeding approach to improve alignment quality, which incorp…
Improving structural similarity based virtual screening using background knowledge
2013
Background: Virtual screening in the form of similarity rankings is often applied in the early drug discovery process to rank and prioritize compounds from a database. This similarity ranking can be achieved with structural similarity measures. However, their general nature can lead to insufficient performance in some applications. In this paper, we provide a link between ranking-based virtual screening and fragment-based data mining methods. The inclusion of binding-relevant background knowledge into a structural similarity measure improves the quality of the similarity rankings. This background knowledge, in the form of binding-relevant substructures, can either be derived by hand selec…
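A minimal sketch of similarity-ranking-based virtual screening, assuming compounds are represented as sets of substructure features and compared with the Tanimoto coefficient, a common structural similarity measure. The optional per-feature weights are a hypothetical stand-in for binding-relevant background knowledge; the paper's actual way of incorporating that knowledge is not described in the truncated abstract, and the feature and molecule names are made up.

```python
def weighted_tanimoto(a, b, weights=None, default=1.0):
    """Tanimoto similarity of two feature sets.

    With weights=None every feature counts 1.0, which gives the standard
    coefficient |a & b| / |a | b|. A (hypothetical) weights dict can
    up-weight features assumed to be binding-relevant; weights must be > 0.
    """
    weights = weights or {}
    union = a | b
    if not union:
        return 0.0
    shared = sum(weights.get(f, default) for f in a & b)
    total = sum(weights.get(f, default) for f in union)
    return shared / total


def rank_by_similarity(query, database, weights=None):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, weighted_tanimoto(query, fp, weights)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = {"benzene", "carbonyl", "amine"}
database = {
    "mol_a": {"benzene", "carbonyl"},
    "mol_b": {"amine", "chloro"},
    "mol_c": {"benzene", "carbonyl", "amine", "hydroxyl"},
}
print([n for n, _ in rank_by_similarity(query, database)])
# → ['mol_c', 'mol_a', 'mol_b']
print([n for n, _ in rank_by_similarity(query, database, {"amine": 5.0})])
# → ['mol_c', 'mol_b', 'mol_a']
```

Up-weighting the hypothetical `amine` feature moves `mol_b` above `mol_a`, illustrating how feature-level knowledge can reorder a similarity ranking without changing the underlying fingerprints.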